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Abstract—In this paper, we will develop a fast direct integral 
equation solver for large-scale electrodynamic analysis. In this 
solver, we significantly reduce the cost of an #(-matrix-based 
computation of electrodynamic problems with prescribed accu- 
racy satisfied, by constructing an efficient matrix algebra-based 
method that minimizes the rank of each admissible block based 
on accuracy; and by developing a new frequency-dependent H- 
partition that minimizes the number of admissible blocks at each 
tree level for each frequency point. We will then develop a fast 
H-matrix-based LU factorization for directly solving the dense 
system matrix resulting from an integral-equation-based analysis 
of large-scale electrodynamic problems. The proposed direct 
solver successfully solves dense matrices that involve more than 
1 million unknowns associated with electrodynamic problems 
of 96 wavelengths in fast CPU time (less than 20 hours in 
LU factorization, 85 seconds in LU solution), modest memory 
consumption, and with the prescribed accuracy satisfied on a 
single CPU running at 3 GHz. As an algebraic method, the 
underlying fast techniques are kernel independent. 


Index Terms—Integral-equation based methods, electrody- 
namic analysis, direct solution, H matrix, large-scale analysis 


I. INTRODUCTION 


He integral equation (IE) based computational electro- 
magnetic methods generally lead to dense systems of 
linear equations. When a direct method is used, the operation 
count is proportional to O(N*) and the memory requirement 
is proportional to O(N), with N being the matrix size. 
When an iterative solver is used, the memory requirement 
remains the same, and the computing time is proportional to 
O(Ni:N?), where Nj, denotes the total number of iterations 
required to reach convergence. The Nj; is, in general, prob- 
lem dependent, solver dependent, and accuracy dependent. 
In recent years, fast solvers such as fast multipole based 
methods [1]-[3] and FFT-based methods [4], [5] have been 
developed that dramatically reduce the memory requirement 
of the iterative IE solvers to O(N), and the CPU time to 
O(NitN log N) for electrodynamic problems. This represents 
an impressive improvement as compared with conventional 
O(N?) or O(N; N?) techniques. 
Fast direct solvers have also been developed for electro- 
dynamic problems. Most recent work can be seen from [6], 
[7]. LU factorization of O(N?) time complexity and O(N +5) 
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memory complexity has been shown by numerical experi- 
ments. Compared to iterative solvers, direct solvers have ad- 
vantages when the number of iterations is large or the number 
of right hand sides is large. For example, if there exist N right 
hand sides, each of which costs O(N; N log N) operations, 
the total cost of the iterative solver will be O(N; N°? log N), 
which is expensive. 

Recently, in [8], [9], we introduced and further developed 
the H- and H?-matrix based mathematical framework for 
reducing the computational cost of iterative IE-based solvers 
for electrodynamic problems. In [10]—[12], we further demon- 
strate that the inverse as well as the LU factorization of a 
dense system matrix resulting from the IE-based analysis of 
static problems or problems with moderate electric sizes can 
be directly obtained in O(N) complexity, for arbitrary 3-D 
structures with non-uniform materials. 

The focus of this work is large-scale electrodynamic prob- 
lems. For this class of problems, we develop a fast direct 
integral equation based solver. In this solver, we significantly 
reduce the computational cost of the H-matrix-based direct 
matrix solution with prescribed accuracy satisfied. We propose 
two methods to reduce the cost of an H-matrix based compu- 
tation. One is to minimize the rank of each admissible block 
based on accuracy requirements. We determine the minimal 
rank for each admissible block by an efficient matrix algebra 
-ased method. The algebraic method has a linear cost for each 
block, and hence the computational overhead is negligible. 
The other method we develop for reducing the computational 
cost is to optimize the H-partition to reduce the number 
of admissible blocks at each tree level based on accuracy 
requirements for each frequency point. We show that the 
admissibility condition used in the conventional #(-partition 
is empirical instead of theoretical, and hence not optimal for 
a given accuracy and a given frequency. We then develop a 
new H-partition method which is frequency dependent, and 
also controlled by accuracy requirements. With the proposed 
new partition, the number of admissible blocks at each tree 
level is significantly reduced compared to that generated by the 
admissibility condition based, i.e. geometry based partition. 
The methods developed in this work for minimizing the rank 
and optimizing the H-partition not only can be used in the 
proposed solver, but also can be used in other fast integral 
equation solvers to speed up their computation. Moreover, 
we develop an efficient 7-matrix-based LU-factorization for 
directly solving the dense system matrix resulting from an 
IE-based analysis of large-scale electrodynamic problems. 
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We also give a detailed implementation of the H-based LU 
factorization, which is not reported elsewhere. 

The basic idea of the proposed work was reported in [21]. In 
this paper, we complete it from both theoretical and numerical 
perspectives. The remainder of this paper is organized as 
follows. In Section II, we give the background of the H- 
matrix based analysis of electrodynamic problems. In Section 
HI, we analyze the storage and the computational cost of the 
H-matrix based computation of electrodynamic problems. In 
Section IV, we propose methods to reduce the cost of the H- 
matrix based computation. In Section V, we give numerical 
proof on the existence of an #-matrix based representation 
of the inverse and LU factors of an IE based system matrix 
for an electrodynamic problem. In Section VI, we present a 
number of pseudo-codes to show a detailed implementation 
of the LU factorization using H matrices. In Section VII, we 
analyze total computational cost. In Section VIII, we present 
numerical results to demonstrate the accuracy and efficiency 
of the proposed direct IE solver. In Section IX, we summarize 
our findings. 


II. BACKGROUND 


The H-matrix based methods are algebraic methods that are 
kernel independent. In the following, we use an electric-field 
integral equation as an example to illustrate the basic concept 
of H-matrix based methods. 


A. Electric Field Integral Equation 


We consider the electric-field integral equation (EFIE) [3], 
[22] 


Eslian = If [jwpFs(r) g(t, Ppa aS" 
= ff JZ ev Ise) vote as’, 0) 


e`irlr=r'| 


in which Green’s function g(r, r’) = 7 > Js is the 
induced surface current density, w is angular frequency, «K is 
wave number, and subscript tan denotes tangential component. 

By expanding the unknown Js using the RWG basis 
functions [22], a Method of Moment based solution of (1) 
results in the following linear system of equations 


GI =V, (2) 


where 
Gmn = i as ff as’ ol E 
- L(y Int) VERE ale), 6 
and 


Va = // Jm(r) -E,dS. (4) 
Sm 


The conventional way to solve (2) could be very expensive, 
since the entries of G are all nonzero. In the following section, 
we introduce the H matrix as a data-sparse representation of 
G, from which a significant reduction in computational cost 
can be achieved. 
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Fig. 1: An illustration of the admissibility condition. 


B. Definition of the H Matrix 


An H matrix is generally associated with an admissibility 
condition [16]. To define an admissibility condition, we de- 
note the whole index set containing the indices of the basis 
functions in the computational domain by Z = {1,2,--- , N}, 
where N is the total number of unknowns. Considering two 
subsets ¢ and s of the Z, the admissibility condition is defined 
as 


True if min{diam(Q:), diam(Q;)} 
(t, s) are admissible = < ndist(Q:, Qs); 
False otherwise, 


(5) 
where, as shown in Fig. 1, Q: s is the minimal subset of the 
space containing the supports of all basis functions belonging 
to t or s, diam/(-) is the Euclidean diameter of a set, dist(-) is 
the Euclidean distance of two sets, and 77 is a positive param- 
eter that can be used to control the admissibility condition. If 
subsets ¢ and s do not satisfy the admissibility condition, they 
are called inadmissible. The admissibility condition shown in 
(5) is empirical rather than theoretical. It is controlled by an 
empirical parameter 7, instead of a prescribed accuracy. In 
this work, we will develop a new method to partition a matrix 
into admissible and inadmissible blocks for electrodynamic 
analysis, which is controlled by accuracy requirements. 

In an H-matrix representation, an inadmissible block keeps 
its original full-matrix representation; while an admissible 
block has a factorized low-rank form. To be specific, an 
admissible block G’* that is formed by subsets ¢ and s can 
be written as a factorized form 


G's = AB’, (6) 


where GoS e C™X", A € C™*X* B e C"** and k €N is 
the rank of G'*. Here, as long as k is less than the minimum 
of m and n, G'S is low rank. The k is not required to be 
O(1). 

If all the blocks G° formed by the admissible (t, s) in G 
can be represented by a factorized low-rank form shown in 
(6), G has an H-matrix representation ( [16], p. 18). Clearly, 
to store admissible G*, we only need to store A and B, the 
cost of which is O(k(m + n)) in contrast with the original 
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Fig. 2: A flow to obtain a rank-k matrix representation of a 
given dense matrix block. 


storage that is O(mn). Similar cost reduction in matrix-vector 
and matrix-matrix multiplication can be achieved. Therefore, 
if we are able to represent the dense matrix resulting from 
an IE based analysis by an H matrix, we can reduce the 
computational cost of IE-based solutions significantly. The 
detail of the H-matrix representation of an IE-based system 
matrix is given in next subsection. 


C. H-Matrix Representation of the IE-Based System Matrix 


The essential idea of an H-matrix representation is to 
represent the off-diagonal matrix block that satisfies the ad- 
missibility condition by a rank-k matrix shown in (6). 

There are three representative schemes that can be used 
to generate a rank-k matrix of an IE-based dense matrix 
block: interpolation, Taylor expansion, and adaptive cross 
approximation (ACA). In [8], [9], an interpolation scheme is 
used to obtain an H-representation without any compression 
cost for electrodynamic kernels. The Taylor expansion can also 
be used to locally replace the kernel function by degenerate 
approximations, as shown in ( [16], p. 10). Neither interpo- 
lation nor Taylor expansion involves rank compression cost. 
However, the rank determined by the two schemes may not 
be a minimal rank required by accuracy. Another approach is 
to directly compute a low rank approximation of the original 
matrix up to a prescribed accuracy. The representative method 
is ACA [16], [19], which is purely algebraic. In [6], [20], ACA 
was used to solve electromagnetic problems. For each matrix 
block, the cost of ACA is linear. A simple flow to obtain the 
H-matrix representation of a given matrix block is shown in 
Fig. 2. 


D. H-Matrix Partition 


To use an H-matrix representation, we have to partition a 
dense matrix into admissible blocks and inadmissible blocks. 
An admissible block is represented by a rank-k matrix shown 
in (6), and an inadmissible block is represented by a full- 
matrix form. 

In [8], [9], we show how to build a cluster tree and a block 
cluster tree to efficiently carry out the H-matrix partition for a 
dense matrix resulting from the IE-based analysis of electro- 
dynamic problems. We denote the cluster tree constructed for 
the full index set Z by Tz. We then find a disjoint partition of 
the index set and use this partition to create children clusters. 
We continue this procedure until the index number in each 
cluster is less than the leafsize which is a parameter to control 


the depth of the tree. A block cluster tree, as shown in Fig. 
3(a), between cluster trees Tz and Tz is constructed based 
on a given admissibility condition recursively. In Fig. 3(a), 
each link in an upper level represents an admissible block 
shown by a shaded block in Fig. 3(b). The number of links is 
bounded by the sparsity constant C'sp, which is the maximum 
number of blocks that can be formed by a cluster in a block 
cluster tree ( [16], p. 125). For each cluster t € Tz , the 
cardinality of the sets col(t) = {s € T7 : (t,s) € Tzx g} and 
row(s) = {t € Tr: (t,s) € Tzx7} is bounded by Cp. 

The admissible block cluster tree results in a matrix par- 
tition as shown in Fig. 3(b). The shaded matrix blocks are 
admissible blocks; the un-shaded ones are inadmissible blocks. 
The admissible blocks and inadmissible blocks together form 
a complete H partition. In Fig. 3(b), we also label the 
levels present in an H partition, which is shown by dashed 
rectangular boxes. 


HI. ANALYSIS OF STORAGE REQUIREMENTS AND 
OPERATION COUNTS FOR H-MATRIX BASED 
COMPUTATION OF ELECTRODYNAMIC PROBLEMS 


In mathematical literature [16], it is shown that stor- 
age requirements and matrix-vector multiplications using H- 
matrices are of complexity O(k.N log N), where k is the upper 
bound of the rank of the admissible blocks, i.e. blocks that 
represent a far-field interaction. Moreover, the inverse and LU 
of an H matrix can be obtained in O(k? N log? N) complexity. 

For frequency independent kernels, a constant rank k across 
all the admissible blocks is sufficient to generate a constant 
order of accuracy for the H-based computation of the dense 
system matrix. This can be theoretically verified from the error 
bound for an H-matrix based representation of a static kernel 
[16]. As a result, the k can be excluded from the complexity 
bounds. Thus, for static problems, the inverse and LU of 
an H matrix can be obtained in O(N log? N) complexity, 
and the storage and matrix-vector multiplications are both of 
complexity O(N log N). 

For frequency-dependent kernels, however, a constant k, in 
general, cannot guarantee a constant order of accuracy across 
a wide range of electric sizes. Since rank k is electric size 
dependent, it becomes a variable that is different in each 
admissible block, especially in admissible blocks at different 
tree levels since each tree level corresponds to a different 
electric size. Traditionally, people use the maximal rank among 
all the blocks, kmaz, to bound the complexity of #-based 
computation of electrodynamic problems. In this paper, we 
show that the cost of an H-matrix based computation of 
electrodynamic problems can be greatly reduced from the 
computational cost obtained based on kmaz. 

This is because first of all, using kmay to bound the 
complexity largely overestimates the cost since many blocks 
have a rank smaller than kmag; secondly, the kmaz-based 
complexity is obtained based on the admissibility condition 
based H-partition that is solely geometry based. By changing 
the H-partition to a frequency-dependent one that is optimized 
based on accuracy, the complexity of H-based computation of 
electrodynamic problems can be reduced. To help understand 
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Fig. 3: (a) A block Cluster Tree. (b) An H-matrix partition. 


the importance of H-partition in determining the actual com- 
putational complexity, we give an example. In Fig. 4, we plot 
the storage of the blocks that have the largest rank between 
0.5kmaxz and kmax in comparison with the storage of the rest 
of the blocks that have a smaller rank, for a cone sphere across 
a wide range of electric sizes from xa = 2 to Ka = 600, where 
k is wave number and a is the largest physical dimension 
of the cone sphere. It is clear that the total storage is not 
dominated by the cost of rank-kmag blocks. In fact, the storage 
of the rank-k,¢, blocks is much less than that of the rest of 
the blocks. This is because although kmaz is the largest, the 
number of admissible blocks that have rank kmax is also the 
smallest, and the cost of an H-matrix based computation is 
determined not only by the rank of each admissible block, 
but also by the number of blocks that can have such a rank. 
In other words, if the weight of each rank-k; block in the 
storage and CPU time cost is the same, then the contribution 
from rank-kmaxs blocks would dominate the storage and CPU 
time cost. However, the weight of each rank-k; block is not 
the same; it is determined by the number of blocks that can 
have such a rank. Since the number of admissible blocks at 
each tree level is determined by the H-matrix partition, the 
H-matrix partition plays an important role in reducing the 
computational cost. One can optimize the H-matrix partition 
to reduce the computational complexity. This is an important 
factor that did not receive much attention in previous research. 
In the following, we give a more quantitative analysis. 


Take the storage complexity of an H-matrix as an example, 
which is also the complexity of an H-matrix based matrix- 
vector multiplication. In an H matrix, since each admissible 
block G”'*"* has a factorized form Am, xk: By, xg, With rank 
ki, the storage is reduced from the conventional full-matrix 
based m; x n; units to ki(m; + ni) units. By summing up the 
storage of all the admissible blocks, we obtain 


nk 
Storage = LD ki(mi + ni), (7) 


i=1 


where nk is the total number of admissible blocks. The above 
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Fig. 4: Comparison between the storage of the rank (0.5kmaz- 
kmax) blocks and that of the other blocks for a cone-sphere 
from xa = 2 to Ka = 600. 


can be evaluated as 
p nki 
Storage = 5 5 ki(mi + ni), (8) 
1=0 i=1 
where / is tree level, p is tree depth, and nk; is the number 
of admissible blocks at level /. 

Clearly, as can be seen from (7) and (8), if we use 
the maximal k among all the admissible blocks, knox, to 
bound the computational complexity, we will overestimate the 
complexity because many admissible blocks have a rank k; 
much smaller than kmaz. Moreover, the number of admissible 
blocks at each tree level, nk ;, also plays an important role in 
determining the computational complexity. Existing admissi- 
bility condition based H-partition is geometry based. It is not 
optimized for a given accuracy and a given frequency. There 
exists a large space to optimize the H-partition to reduce nk; 
without sacrificing accuracy. 


A. Proposed definition of average partition rank, kave, for the 
H-based computation of electrodynamic problems 

From the aforementioned analysis, the cost of an H-based 
computation of high-frequency problems is determined not 
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Fig. 5: Comparison of two average partition rank for a 3D 
PEC plate with electric size from 2 to 30 wavelengths. 


only by the block rank k; but also by the number of admissible 
blocks at each tree level, i.e. H-partition. In light of this fact, 
we propose a new parameter, average partition rank kave, 
in order to analyze the computational cost of an H-based 
computation of high-frequency problems in a more objective 
way. We define average partition rank kaye as the following: 
we ET) 
where nk is the total number of admissible blocks, k; is the 
rank of the i-th admissible block, m; and n; are the number 
of rows and columns of the admissible block. From the above 
definition, it is clear that kave is not only related to the block 
rank k;, but also related to the H-partition. Different partitions 
can result in different kave. That is why we name kave as 
average partition rank to distinguish it from the block rank that 
does not contain any information about partitions. Different 
from maximal block rank knox, the kaye can be reduced by 
changing H-partition, which will become clear by numerical 
results shown in Section IV. 
Similarly, we can define an average square partition rank 
as the following. 


kave = (9) 


Tek (rm + 76) 
Den (m; + m) 
From (10) and (9), it can be seen that the (k?)ave is an 
O(k2 e) quantity. As an example, in Fig. 5, we plot kave and 
/(k?) ave for a 3D conducting plate from 2 to 30 wavelengths. 
It can be seen that these two average rank follow each other 
closely. 


(k?)ave = (10) 


B. Parameters that can be used to reduce the cost of H-based 
computation of electrodynamic problems 
By using (9), the storage complexity (7) can be written as 
nk 


Storage = kave 5 (Mi + ni), 


i=1 


(1) 


The summation of the m; and n; over all the admissible blocks 
can be evaluated as the following: 


nk p nk 


XO (mi +7) < DiG x 2) = ee x 2), (12) 
l=0 


i=1 l=0 i=1 


in which we use the fact that the row/column dimension of a 
block at level l is z, where l = 0 represents the root level. 
The above should be evaluated based on actual nk;, which is 
problem dependent and partition dependent. Substituting (12) 


into (11), we obtain 


a N 
Storage < kave 5 niki (sr x 2). 
1=0 


(13) 


Thus, it becomes clear how to reduce the storage of H-based 
computation of electrodynamic problems: we should reduce 
kave and nk. 

To analyze the CPU time cost, take the matrix-matrix 
multiplication as an example. The cost for each admissible 
block involved in the multiplication is Cspki (mi + ni) ( [16], 
pp. 127-130). By summing up the cost of all the admissible 
blocks across all the tree levels, we obtain 

p nk 
Operation counts = C's, 5 5 ki? (mi + ni) 
1=0 i=1 


p 
< Cop X (k?)ave X. (mi + ni) (From (10)) 





+nj) 


P P 
< Copkžve 5 5 nki(m + ni), 


where p is the tree depth that is proportional to log N. In 
the last inequality, we use the fact that (k”)ave is an O(k?,,.) 
quantity, as shown in Section III-A. Again, from the above, it 
can be seen that by reducing kave and nk;, we can reduce the 
cost of an H-based computation. It is worth mentioning that 
the above analysis of storage and time cost is valid for any 
H-partition. 


(14) 


IV. PROPOSED METHODS FOR REDUCING THE 
COMPUTATIONAL COST OF H-MATRIX BASED 
DIRECT SOLUTION OF ELECTRODY NAMIC 
PROBLEMS 


From (13) and (14), it can be seen that to reduce the 
computational cost of an H-based computation of electrody- 
namic problems, we have to reduce kave and nk;. In next two 
subsections, we propose methods to minimize kave for a given 
partition and optimize H-partition to minimize nk; based on 
accuracy requirements. The number of admissible blocks at 
each tree level, nkj, can be qualitatively measured by C'sp, 
which is the maximal number of blocks that can be formed 
by a cluster. 


A. Proposed method for minimizing average partition rank 
kave based on a prescribed accuracy for a given H-partition 


Given an H-partition, to minimize kaye, we determine 
a minimal rank based on a prescribed accuracy for each 
admissible block. The reason is obvious. If each admissible 
block has a minimal rank, the resultant average partition rank 
kave for the given H partition is also minimized. 


Given an accuracy requirement, singular value decomposi- 
tion (SVD) is the most accurate method to obtain the minimum 
rank that can meet the accuracy requirement for an admissible 
block. However, if we directly apply SVD to the original 
full matrix to obtain its H-matrix representation, although the 
resultant rank is minimal, the computational cost is high. 

On the other hand, we can use interpolation, the Taylor 
expansion, and ACA-based approaches to efficiently convert 
a full matrix block to an -matrix representation. However, 
the resultant rank is, in general, not the minimal one that is 
necessary to satisfy the accuracy requirement. In other words, 
the resultant rank can be much larger than what is necessary 
to satisfy a prescribed accuracy. 

In this paper, we first use ACA+ ( [16], pp. 71-74), which 
is a variant of ACA [19], to efficiently compute an H- 
matrix representation. We then apply SVD to the H-matrix 
representation to determine the actual rank that is needed to 
satisfy the accuracy requirement [17]. By doing so, we keep 
the advantages of both SVD and ACA-based methods. The 
resultant rank is minimal, and meanwhile it is obtained in 
linear complexity for each block. In the following, we give 
more details of the proposed approach. 

First, we use ACA+ to numerically obtain a factorized 
form of an admissible block. The ACA+ involves less storage 
and computational cost than ACA. The detailed procedure of 
ACA+ is very similar to that of the conventional ACA. The 
difference between them is as follows. At the beginning of an 
ACA+ algorithm, a reference row and a reference column of 
the original matrix are chosen to determine where to start the 
pivot search. A row and column pivot index is then determined 
from the reference ones. In the subsequent steps, the reference 
row and column can still be used. But if they are chosen as a 
pivot index, a new reference row and a new reference column 
has to be chosen. This method only requires assemble k rows 
and k columns of an admissible block, where k is the rank 
determined by a certain accuracy requirement e. The output 
of an ACA+ algorithm is qe" = Am,kBz, x» where k is, 
in general, much less than m and n. The ACA+ algorithm 
terminates when 


e- G] = ||e - aB"|| < e Gl (15) 


is satisfied. Therefore, the error of the resultant H-matrix 
representation is bounded by e. 

After the ACA+ is completed, we obtain a factorized form 
Am, Bh, k- For such a factorized form, SVD can be efficiently 
performed by a reduced SVD ( [16], p. 108). The resultant 
computational cost is O(k?(m + n)), which is linear. 

To test the effectiveness of the proposed approach to 
determining the minimal rank of an admissible block, we 
simulated a 3D conducting plate of 10 wavelengths, and a 
3D conducting sphere of 8 wavelengths, respectively. The e in 
ACA+ and SVD was set to be 1074. In Fig. 6, we plot the rank 
distributions with the proposed scheme (ACA+ and SVD) and 
with ACA+ only in the lowest H-partition level that has an 
admissible block. This level is the closest to the root of a block 
cluster tree, and hence the admissible blocks therein have 
the largest electric size. The horizontal axis of Fig. 6 is the 
admissible block index. It can be seen from Fig. 6 that by using 
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Fig. 6: Rank distribution in the lowest level of an H-partition 
generated with € = 1074. (a) A 10 X perfect electrically 
conducting (PEC) plate.(b) An 8 A PEC sphere. 


the proposed method, the rank in most admissible blocks is 
reduced by half. Moreover, the same accuracy is achieved. The 
|e- ē| /16 
norm is used. The accuracy is 1.021689 x 107 without the 
proposed rank minimization, and 1.079537 x 1075 with the 
proposed minimization for the plate example. Clearly, the rank 
is reduced without sacrificing accuracy. The same is true for 
the sphere example. The accuracy is 1.931219 x 1075 without 
the proposed rank minimization, and 1.996355 x 1075 with the 
proposed minimization. In addition, with the proposed method, 
the new rank distribution becomes more uniform across all the 
admissible blocks as can be seen from Fig. 6. 


accuracy is measured by , where a Frobenius 











B. Proposed method for optimizing H-matrix partition to 
minimize the number of admissible blocks at each tree level 


Existing H-matrix partition is based on the admissibility 
condition as shown in (5). It is not an optimal one for 
frequency dependent kernels. This can also be understood from 
the fact that the admissibility condition (5) is frequency inde- 
pendent. Physically speaking, whether the interaction between 
two regions can be represented by a low-rank block or not 
is frequency dependent. For example, for a given accuracy, it 
is possible that an off-diagonal block that is inadmissible at 


certain frequencies becomes admissible at a lower frequency. 
It is also possible that multiple small admissible blocks can 
be merged into a single admissible block with prescribed 
accuracy. In other words, they start to become admissible 
in a lower level of an inverted block cluster tree when the 
frequency changes. However, the admissibility condition given 
in (5) is empirical instead of theoretical. It is controlled by an 
empirical parameter 7, instead of a prescribed accuracy. 

In the following, we show our proposed algorithm that can 
optimize an H-matrix partition based on a prescribed accuracy. 
The resultant H-matrix partition is frequency dependent. It 
significantly reduces the number of admissible blocks at each 
tree level, with the prescribed accuracy satisfied. 

The proposed H-matrix partition algorithm for a prescribed 
accuracy €op¢ is shown in (16). A new error tolerance €opt iS 
introduced instead of reusing e€ in (15) to facilitate separated 
accuracy control of the H-partition optimization and the rank 
minimization for each admissible block. In (16), the original 
partition given by (5) is used as an initial guess from which 
we construct an optimized partition. 





H-Partition Optimization 
Procedure H — Popt(P, €opt) 
(Input P is the original H partition, 
output P is overwritten by an optimized H partition) 
If P is a non-leaf off-diagonal matrix block 
for @=0;1 < 4;i + +) 
if P(i) is an inadmissible block 
Rk_Factor(P (i), €opt) 
end if 
if P(i) is a non-leaf block 
H — Popt(P (i), €opt) 
end if 
end for 
If all blocks in P are admissible blocks 
Merge_Rkblocks (P, copt) 
end if 


end if (16) 





In (16), the function Rk_Factor is to factorize a full- 
matrix block to a rank-k matrix shown in (6) based on a 
prescribed accuracy. The procedure is the combined ACA+ 
with SVD, which is detailed in section IV-A. The function 
Merge_Rkblocks is to merge multiple small admissible blocks 
to a single one based on the prescribed accuracy. To give an 
example, four admissible sub-blocks can be merged into one 
admissible block as follows. 


| Ai(fi)Bi(fi)” A2(f2)B2( f2)" | = 
A3(f3)B3(fs)" Aa(fa)Ba(fa)” 


PPPE T. atl 


| aoh) | | ga J + | Aa(fa | | B.C) | = 


Aa (FOBA) + Äalfa)Bal d) + As(fs)Ba(fz) + 


Au(fs)Ba(fa) “22 ABT, (17) 
where the f;(¢ = 1,2,3,4) denotes the electric size the blocks 
are associated with, and the addition in the final step is carried 
out by the truncated addition operation in ( [16], p. 110), with 
the new rank k determined based on the accuracy requirement 
€opt- Different from static problems, in electrodynamic prob- 
lems, when the size of the admissible block increases, its rank 
also increases in general. Consequently, although each merging 
operation reduces four admissible blocks to one block, it 
also increases the rank of the resultant block. Therefore, we 
should check which one is computationally more efficient: 
merging or not merging. This can be assessed by comparing 
the storage requirement or computational operations of the 
merged block with that of the four children admissible blocks. 
If the former is less than the latter, we perform merging; 
otherwise, we do not perform merging, instead we keep the 
original four blocks. To be more specific, we check whether 
kmerge(Mmerge + merge) < Di kj(m;j + nj) is satisfied 
or not, where kmerge is the rank of the big block resulting from 
the merging operation, Mmerge(Nmerge) is the row(column) 
dimension of the block, and k; is the rank of the children 
admissible blocks. If it is satisfied, we merge blocks based 
on accuracy requirements; if not, we keep the original blocks. 
By doing so, the cost of storage and computation based on 
the resultant new H matrix is minimized for the prescribed 
accuracy at each frequency. 

In (16), we only perform two basic operations: making an 
inadmissible block in the off-diagonal part admissible based 
on a prescribed accuracy via function Rk_Factor, or merging 
small admissible blocks to be a big admissible block based on 
the accuracy requirement through function Merge_Rkblocks. 
Both operations do not increase the number of inadmissible 
blocks. In fact, the number of inadmissible blocks is even 
reduced due to the first operation. Therefore, given an H- 
partition, the proposed optimization algorithm does not in- 
crease the number of inadmissible blocks. Thus, if the number 
of admissible blocks is reduced, the total number of blocks is 
also reduced. 

To validate the effectiveness of the proposed H-partition 
optimization algorithm, we simulated a conducting plate from 
2A to 60A, the number of unknowns of which is from 1,160 
to 1,078,800. The €,, was chosen as 1078. In Fig. 7, we plot 
the maximum number of admissible blocks that can be formed 
by one cluster in a block cluster tree (Caa) in the original H- 
partition, and that in the optimized H-partition. Clearly, the 
Caa is reduced significantly. Since the proposed partition 
optimization does not increase the number of inadmissible 
blocks as analyzed above, the C's, is also reduced significantly. 


When the H-partition changes, the average partition rank 
kave also changes. Thus, in addition to Csp, we need to 
examine the product of kave and C's, to assess the success 
of the proposed H-partition optimization algorithm, since it is 
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Fig. 7: Caa versus N in the simulation of a conducting plate 
from 2A to 60A. (a) Original H-partition. (b) Optimized H- 
partition. 


kaveCsp that determines the storage and time cost as can be 
seen from (13) and (14). In Fig. 8(a), we plot kaveCaa for 
the plate example simulated above with respect to Ka, where 
k is the wave number, and a is the side length of the plate. 
We compare the kaveCaa generated by the proposed method, 
and that generated by the conventional method which is based 
on an ACA-based rank scheme and an admissibility condition 
based H-partition. It is clear that the proposed method greatly 
reduces kaveCaa. In Fig. 8(b), we plot kaveCaa for a sphere 
of diameter a with respect to xa from 2 A to 45 A. Again, 
kaveCaa is greatly reduced by the proposed method. Since 
the proposed H-partition optimization does not increase the 
number of inadmissible blocks as analyzed above, the kaveC'sp 
is also reduced significantly. 

Although C'sp is a parameter that can be used to qualitatively 
measure the number of blocks formed by an H-partition, it 
does not give a quantitative measurement of the number of 
blocks obtained by the H-partition. To provide a quantitative 
analysis, in Fig. 9(a), we plot the exact number of blocks 
with respect to tree level generated by the original H-partition 
which is based on the admissibility condition in comparison 
with the number of blocks generated by the proposed partition, 
for a 15X sphere. In Fig. 9(b), we plot the same for a 40A 
plate. Clearly, compared to the original partition, the proposed 
partition significantly reduces the number of blocks at each 
tree level, and hence significantly reducing the computational 
cost. In addition, across the entire tree depth, it is observed 
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Fig. 8: Comparison between kayeC'sp generated by conven- 
tional method and that generated by the proposed method. (a) 
A PEC plate from 2 A to 60 A. (b) A sphere from 2 A to 45 
À. 


that the number of blocks is not necessarily reduced by half 
when one ascends the inverted tree by one level from leaf level 
l = p to root level l = 0, which can be seen from both the 
conventional partition and the proposed partition. 


C. Study on the dependence of kave and Csp of the proposed 
methods with respect to scatterer shape 


We varied the scattering shape continuously from plate, 
Sierpiski gasket, cylinder, open cone, cone sphere, to sphere. 
We plotted the maximal rank kmay versus electric size for 
these scatterers. For comparison, we also plotted the average 
partition rank kave obtained by the proposed methods for the 
same accuracy. As can be seen from the figures in the left 
column of Fig. 10 and Fig. 11, for all these scatterers, kave is 
much smaller compared to kmax. In addition, the rate of the 
change of kave with respect to electric size is much slower than 
that of kmag. Theoretically speaking, this is because kave is 
not only determined by the rank of each admissible block, but 
also determined by the number of admissible blocks at each 
tree level, and hence the H-partition. 

In the right column of Fig. 10 and Fig. 11, we plot the 
number of admissible blocks obtained by the proposed method 
with respect to electric size for a variety of scatter shapes. The 
dependence of the Csp on the scatter shape is shown to be 


little. In addition, with proposed methods, both kaye and C's, 
are minimized to be small compared to N. 
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Fig. 10: The kave and C’,, generated by the proposed method with respect to electric size for a variety of scatterer shapes. 
(a-b) Plate. (c-d) Sierpiski gasket. (e-f) Cylinder. (g-h) Open cone. 


D. Study on the dependence of kave and Csp on accuracy 
requirements 


We also tested the dependence of kave and Cy, with 


respect to accuracy. The results in Fig. 10 and Fig. 11 were 
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Fig. 11: The kave and C's, generated by the proposed method with respect to electric size for a variety of scatterer shapes 


(Continued from Fig. 10). (a-b) Cone sphere. (c-d) Sphere. 


generated based on € = 1074 and €opt = 1073, where e 
is a parameter shown in (15) for controlling the accuracy 
of rank reduction, and €,,; is a parameter shown in (16) 
for controlling the accuracy of the H-partition optimization. 
To test the dependence with respect to accuracy, we set 
e€ = 1077 and €opt = 1076. We used a plate and a sphere 
as examples, and tested the dependence of kaye and Cp 
generated by the proposed methods with respect to accuracy 
across a wide range of electric sizes. As can be seen from 
Fig. 12, with the accuracy requirement increased, kave and 

(k?) ave increase. However, the increase is small compared 
to the three orders of magnitude increase in accuracy. In 
addition, the dependence of kaye and \/(k?)ave with respect to 
electric size is similar to what is observed for a lower order of 
accuracy. In addition, with the accuracy requirement increased, 
the number of admissible blocks also increases. Again, the 
increase is minor compared to the three orders of magnitude 
increase in accuracy. Moreover, the frequency dependence of 
C'sp is also similar to that for a lower order of accuracy. 


V. ON THE EXISTENCE OF THE H-MATRIX 
REPRESENTATION OF G`! AND G’s LU FACTORS 


In this section, we numerically prove the existence of an H- 
matrix representation of the inverse of G and G’s LU factors 
by examining their rank distributions. The detailed procedure 
is as follows: we directly compute G~* and G’s LU factors 
without introducing any approximation; we then use SVD to 
obtain the rank distribution in G~* and G’s LU factors. Since 
the computation requires a complete full-matrix form of G7! 


and G’s LU factors, it is not practical to use a problem having 
a large electric size as an example. We thus considered a 
conducting plate of 6 and a conducting sphere of 4A. We 
computed the rank of each off-diagonal block that satisfies 
the admissibility condition (5) at the lowest level of a block 
cluster tree. The accuracy requirement in SVD is set to be 
1074. In Fig. 13, we plot the rank distribution measured by 
k/min(m,n) with respect to the matrix block index. It is 
clear that the off-diagonal blocks that satisfy the admissibility 
condition in G7! and G’s LU factors are low rank. Therefore, 
not only the original matrix G can be represented by an H- 
matrix, but also G7} and G’s LU factors can be represented 
by an H-matrix. Furthermore, the rank distribution of G7! 
and G’s LU factors is very similar to that of G. In addition, 
both G7} and G’s LU factors have a smaller rank than the 
original matrix, with the rank of G’s LU factors being the 
smallest. This suggests that the H-matrix partition constructed 
for the original matrix G is equally applicable to G~* and 
G’s LU factors. In Fig. 13, U’s rank distribution is plotted to 
represent L’s and U’s rank distribution since these two have 
a very similar rank distribution, with U’s rank being slightly 
larger. 


VI. PROPOSED FAST IMPLEMENTATION OF LU 
FACTORIZATION 


In this section, we show how to perform a fast LU factor- 
ization using the H-matrix based representation of G and G’s 
LU factors. Based on the findings in the section above, we 
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Fig. 12: The dependence of kaye and C's, generated by the proposed method with respect to accuracy requirements over a 
wide range of electric sizes (Accuracy setting 1: € = 1074 and €opt = 107°; Accuracy setting 2: € = 1077 and ope = 107°). 


(a-b) Plate. (c-d) Sphere. 


use the H-partition constructed for G for the H-partition of L 
and U. 

The H-based LU factorization has been discussed in ( [16], 
p. 119). However, no detailed implementation is given. In the 
following, we give a number of pseudo-codes to show a fast 
implementation of #-based LU factorization, which is not 
reported anywhere else. It is worth mentioning that [6] did 
an ACA-based LU factorization, the procedure of which is 
very different from the proposed one. 


A. LU factorization basics 
Given an IE-based system matrix G, we cast it into a form 


c= |G alt 


18 
Ga Ga (18) 


The LU decomposition can be recursively computed by the 
Ui2 | 


following equation: 
Lu 0 | | Un 
Lo, Loe 0 Uz 
L 


Gi Gir 
G = 
| Goi Gr | 
U. (19) 


B. Proposed fast implementation of the LU factorization 


We developed a pseudo-code shown in (20) to recursively 
perform LU factorization. 





LU-Decomposition G=LU 
Procedure H-LU(G) 
(G is the input matrix overwritten by L and U) 
If G is a non-leaf block 
H-LU(G11) > Li, U11, 
Solve_LX(Li1, Giz) > U12, 
Solve_XU(Go1, U11) > Lai, 
—Lai x Ui2 + G22 > G22, 
H—LU(G22) > L22, U22, 
else 


Full-LU(G) (20) 





The underlying algorithm is as follows. When G is a non-leaf 
matrix block, we recursively call (20) until Gj; is a full 
matrix block. We then directly compute the LU factors of 
the G,, using a full-matrix-based LU factorization, which 
generates L,, and U,,. Next, we call function Solve_LX 
shown in (21) and Solve XU to compute Uj2, and Lo 
respectively. 





Algorithm for Solving a Lower Triangular System 
LX = G, with G being an matrix 
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Fig. 9: Comparison between the number of blocks generated 
by the conventional geometry-based partition and that by the 
proposed optimal partition with respect to tree level. (a) A 15A 
sphere. (b) A 40A sphere. 


Procedure Solve_LX(L,G) 
(L and G are input matrices, G is overwritten by X) 
If L is a non-leaf block 
If G is a non-leaf block 
Solve_LX(Li1,Gii), Solve_LX(Li1, G12) 
—Lai x G11 + G21 —> Gai, Solve_LX(L22, G21) 
—L21 x G12 + G22 —> G22, Solve_LX(L22, G22) 
else if G is an admissible block 
Solve_LF(L, A) 
else 
Solve_LF(L, G) 
end if 
else 
Full_LX(L, G) 
(Solve a full-matrix triangular system) 
end if 





(21) 
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Fig. 13: Rank distributions in G, G, and G’s LU factors. 
(a) PEC plate. (b) PEC sphere. 





is to solve an upper triangular system, which can be derived 
in a similar fashion as (21). In (21), a function Solve_LF is 
called. Similar to Solve _LX, Solve_LF also solves a triangular 
system. The difference is that the right-hand-side matrix 
for Solve_LX is an H matrix, whereas that for Solve _ LF 
is a full matrix. The pseudo-code of Solve_LF is given in (22). 








The Solve_LX(L,G) is to solve a lower triangular system 
LX = G, where L and G are input matrices having H- 
representations, and X is the solution. The Solve_XU(G, U) 


Algorithm for Solving a Lower Triangular System 
LX = F, with F being a Full Matrix 
Procedure Solve_LF(L,F) 
(L and F are input matrices, F is overwritten by X) 
If L is a non-leaf block 
Solve_LF(L11, F1) 
—Le x Fi + Fo > F2 
Solve_LF(L22, F2) 
else 
Full_LX(L, F) 
(Solve a full-matrix triangular system) 


end if (22) 





In the final step of (20), we use Uj2 and Lə; to update G22, 


and then call (20) recursively until Lə2 and U22 are computed. 
As can be seen from (19) to (22), efficient LU factorization 
relies on efficient block multiplication and block addition. In 
next subsection, we show how to efficiently perform these two 
operations for a prescribed accuracy. 


C. Fast implementation of the block multiplication G = 
We give a pseudo-code of computing G? =G" x G” in 
(23). 





Recursive Multiplication Algorithm 
Procedure H-mult(G"?, Gg, G?, ceLU) 
If (G°!, G??, G? are all non-leaf blocks) 
forli =0;i<2;i+ +) 
for(j =0;j < 237+ +) 
for(k =0;k <2;k+ +) 
H-mult(G?! (i, k), G? (k, j), G? (i, j) 
else if (G? is a non-leaf block, 
G?! or G”? is a leaf block) 
Multiply_RK(G"!, G??,Õ’) 
G GtG 
else if G? is an admissible block 
Multiply_RK(G"', G°?, G’) 
else if G? is an inadmissible block 
Multiply_Full(G”', G°?, G’) 
end if 





(23) 





where, 6, b,, and bə represent three blocks in the same level 
of an H-partition, epy represents a prescribed accuracy. If 
GÈ, (eL and G”? are all non-leaf blocks, we recursively call 
(23). If one of G?! and G” is a leaf block, or G? is an 
admissible block, we call function Multiply_Rk shown in 
(24) to compute an admissible product. In (23), the addition 
is performed based on the prescribed accuracy egy, which 
is denoted by “£". The detailed procedure of the addition is 
given in the following subsection. 





Procedure Multiply_RK(G”' ; Ge G €LU) 
if G’! and G” are both non-leaf blocks 
for(i =0;i < 2;i + +) 
for(j =0;j < 2;j ++) 
for(k =0;k < 2;k + +) 
Multiply_RK(G"!(i, k), G? (k, j), © (i, j) 
G? G 
(Č is a non-leaf block) 
else if G°! or G”? is an admissible block 
GAB? > (G"A)BT = A,B’ = 6" 
(G’ is an admissible block) 


G&G +e! 

else if G’ or G? is an inadmissible block 
G''G? = GIF P (G"ayB? = A,B" = 6 
GG +e 


end if (24) 





In (24), there are two multiplication cases. One is to 
multiply an admissible block G’! by an admissible block of 
an ABT form, for which we can compute G?'A as a new 
A. The other multiplication case is to multiply G?! by a full 
matrix block F, for which we can first apply SVD to F to 
generate a form AB’ based on the prescribed accuracy ezu. 
If G? is a full matrix block, a normal full matrix multiplication 
is computed. The additions in (24) again are performed based 
on ELU. 


D. Fast implementation of the block addition G? = G?! + G? 


Two cases are involved in the addition operations. 

Case 1: If GÈ, G; and GË? have the same H-partition, 
the addition can be done using the following procedure. (a) 
If three blocks are all full matrices, we simply add two full 
matrices up. (b) If three blocks are all admissible matrices, 
for example, Gi = An BE with rank ky, G? = AB% 
with rank kj, and G? = A,B7, the G? = G?! + GË? can be 
realized by a truncated addition operation using the approach 
shown in ( [16], p. 110). The rank k of the resultant G? is 
adaptively determined by the prescribed accuracy ezy. (c) If 
three blocks are all non-leaf blocks, the addition can be carried 
out by summing over all the inadmissible blocks using (a), and 
all the admissible ones using (b) respectively. 

Case 2: If the three blocks do not share the same partition, 
we convert the H-matrix partitions of G’! and G” both into 
the partition of G?. Take the block G?! as an example. If G?! 
is an admissible block but G? is a non-leaf block that has 
four admissible subblocks, we convert G?! by the following 
formula 


Gg = 
zs ~ T 
A B 
pr _| At Bi | 
oe | A2 | | B2 | 
~ -T ~ =T 
AiB, AiB, _ a” (25) 
~ ~T ny ~T Arie . 
ÀB. AGB, 








where Gg” contains four admissible sub-blocks, which is 
exactly equal to G’! The opposite procedure, where G? is 
an admissible block while G?! contains four admissible sub- 
blocks, can be performed by the scheme shown in (17). 


VII. TOTAL COMPUTATIONAL COST ANALYSIS 


Three numerical procedures are involved in the proposed 
direct IE solver: rank minimization, H-partition optimization, 
and LU-based direct matrix solution. We analyze the compu- 
tational cost of each in the following. 


The cost of the rank minimization scheme for each admissi- 
ble block, described in section IV-A, is linear. This is because 
both ACA+ and reduced SVD have a linear cost for each 
admissible block [16], [19]. 

For the H-partition optimization shown in (16), two basic 
operations are involved. One is the factorization of inadmis- 
sible blocks. This operation is carried out by the function 
Rk_Factor based on ACA+ and SVD. The other is the conver- 
sion of non-leaf blocks into an admissible block shown in (17). 
This operation is carried out by the function Merge_Rkblocks 
by using an SVD based truncated addition. Since both ACA+ 
and SVD have a linear cost for each matrix block, the two 
basic operations involved in (16) also have a linear cost for 
each matrix block. 

From procedure (20), it can be seen that at the leaf level, 
the computation of the recursive LU factorization essentially 
includes a full-matrix LU factorization, a full-matrix solution 
of a lower triangular system, and a full-matrix solution of 
an upper triangular system, all of which have the same 
computational cost as a full-matrix block multiplication. At all 
the other levels, a number of block-block multiplications are 
computed, which have the same recursive pattern as that in an 
H-based matrix-matrix multiplication. Therefore, the H-based 
LU factorization has the same cost as H-based multiplication, 
which is shown in (14). The H-based LU solution has the 
same cost as the H-based matrix-vector multiplication, and 
hence storage, which is derived in (13). The proposed methods 
minimize kaveC'sp, and hence kave and nk; based on accuracy 
to reduce the computational cost. 


VII. NUMERICAL RESULTS 


To test the performance of the proposed direct IE solver, we 
simulated a PEC (perfect electrically conducting) plate, a PEC 
sphere, and a PEC cylinder from a small number of unknowns 
to over 1 million unknowns, from small electric sizes to over 
95 wavelengths. In all these examples, 7 = 1 and leafsize = 32 
were used. Note that 7 = 1 was used to generate an initial 
H-partition, which was later replaced by the optimized H- 
partition by the method proposed in Section IV-B. The error 
tolerance € 5p; used in the H-partition optimization was set as 
10-3. The error tolerance ezy used in the LU factorization 
was 1072. The computer used was a Dell’s PowerEdge 6950s 
server with 8222SE AMD Opteron processors. Double preci- 
sion was employed in all the simulations. 

First, we tested the accuracy of the H-matrix representation 
obtained by the proposed rank minimization and partition op- 
timization methods. The error of the H-matrix representation 
is measured by |G- G| /||G||, where G is the H-matrix 
representation of the original G, and the Frobenius norm is 
used. We simulated a PEC plate from 2 A to 14 À, and a PEC 
sphere from 2 A to 8 A. Fig. 14 shows the accuracy of the 
H-matrix representation generated from the proposed method. 
Excellent accuracy can be observed in the entire range of 
electric sizes. Since the assessment of the matrix error requires 
the knowledge of the full matrix G, we did not simulate larger 
problem sizes in this testing case. 
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Fig. 14: Accuracy of the proposed H-matrix representation. 
(a) PEC plate from 2 A to 14 A. (b) PEC sphere from 2 A to 
8 A. 


Next, we tested the accuracy and efficiency of the proposed 
direct LU based IE solver. The first example is a conduct- 
ing sphere illuminated by a normally incident plane wave. 
The electric size of the sphere is from 2 A to 45 A. The 
discretization results in unknowns from 3,688 to 1,152,368. 
The average partition rank kaye resulting from the proposed 
rank minimization and H-partition optimization methods is 
shown in Fig. 15(a) in the entire range of electric sizes. It 
can be seen that kave is minimized to be a small number 
compared to N. In Fig. 15(b), we plot the memory cost of 
the proposed solver. In 15(c) and (d), we plot the CPU time 
of the LU decomposition, and LU solution, respectively. It is 
clear that the memory and CPU cost of the proposed direct 
solver scale more favorably with NV compared to conventional 
direct solvers. To test the accuracy, in Fig. 16(a) and (b), we 
plot the E-plane bi-static RCS simulated for 12 À and 26 A, 
respectively. The RCS generated with ezy = 107° is shown 
to agree very well with the analytical Mie-Series solution. In 
Fig. 16(c), we plot the solution error. Less than 6% error is 
observed in the entire frequency band. 

The second example is a 3D conducting plate, the electric 
size of which is from 2 A to 60 à. The discretization results in 
1,160 unknowns to 1,078,800 unknowns. The average partition 
rank kave resulting from the proposed rank minimization and 
partition optimization methods is shown in Fig. 17(a). It is 
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Fig. 15: Simulation of a PEC sphere from 2 A to 45 A with unknowns from 3,688 to 1,152,368. (a) Average partition rank 
kave. (b) Memory requirement. (c) LU factorization time. (d) LU solution time. 


clear that the average partition rank is minimized to be a small 
number, in addition, for the plate example, it is controlled 
to be almost a constant in the entire range of electric sizes. 
The C'sp for this example can be seen from Fig. 7(b). In Fig. 
17(b), we plot the memory cost of the proposed direct IE 
solver. In Fig. 17(c), we plot the solution error of the proposed 
LU solver measured by G1 — v| / ||V||. Good accuracy can 
be observed. In addition, the accuracy is kept to be almost a 
constant in the entire range. From Fig. 17(d) to (f), we plot the 
CPU time of the H-matrix construction, LU decomposition, 
and LU solution, respectively. The computation for the 60 A 
case having over | million unknowns was finished within 10- 
hour LU decomposition time, 55-second LU solution time, and 
costing 31.5 GB storage only. 


The last example is a conducting cylinder, the length of 
which is from 2 A to 96 À. The ratio of length to radius is 20. 
The number of unknowns is from 1,391 to 1,075,200. In Fig. 
18(a), we plot the average partition rank kave resulting from 


the proposed rank minimization and partition optimization 
methods in the entire electric-size range. In Fig. 18(b), we plot 
Caa Versus the electric size. Without the proposed H-partition 
optimization, the Cg is between 47 and 51. Clearly, Caa, and 
hence C'sp is reduced greatly. In Fig. 19(a), we plot the solution 
error with respect to the number of unknowns. Good accuracy 
is observed. Note that the error is controllable by €, €opt, 
and ezu. If better accuracy is required, it can be obtained by 
reducing the error tolerances. From Fig. 19(b) to (d), we show 
the computational cost of the proposed direct LU solver in LU 
factorization, LU Solution, and storage. The LU factorization 
for the 1,075,200 unknown case costs less than 20 hours and 
37.6 GB memory. The LU solution time is 85 seconds only. 
Compared to results obtained for similar examples reported in 
open literature, the proposed direct IE solver is much more 
efficient in both CPU time and memory consumption, even 
though double precision was used for computation. 

The above direct matrix solutions were all generated based 
on € = 107+, €op¢ = 1073, and czy = 1077. To test the 
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Fig. 16: (a) RCS of a conducting sphere simulated by the proposed solver at 12 A. (b) RCS of a conducting sphere simulated 
by the proposed solver at 26 A. (c) Solution error in the entire frequency band from 2 A to 45). 


performance of the proposed direct solver for a higher order 
of accuracy, we set e = 1075, copt = 1074, and ezy = 107°. 
We simulated the same plate example simulated in Fig. 17. 
In Fig. 20(a), we plot the solution error with respect to the 
number of unknowns. An excellent accuracy can be observed 
in the entire range of electric sizes, where the worst error is 
shown to be less than 0.33%, in comparison with the 3% error 
achieved by the previous accuracy setting. From Fig. 20(b) to 
(d), we show the computational cost of the proposed direct 
LU solver in memory, LU factorization, and LU solution with 
respect to N. 


IX. CONCLUSION 


In this work, we show that the cost of an H-based com- 
putation of high-frequency problems is determined not only 
by the block rank k; (the rank of each admissible block) but 
also by the number of admissible blocks that have rank ki, 
and hence H-partition. By optimizing the H-partition based 
on accuracy for each frequency to minimize the number of 
admissible blocks at each tree level, the computational cost of 
an H-based computation of high-frequency problems can be 
significantly reduced. We show that there exists a large space 


to optimize the H-matrix partition for frequency dependent 
problems. Existing H-matrix partition is based on a geometry- 
based admissibility condition. This condition is controlled by 
an empirical parameter instead of a prescribed accuracy. The 
resultant partition is not optimal, especially for electrodynamic 
problems. We hence develop a new H-partition method that is 
frequency dependent, and also directly controlled by accuracy 
requirements. With the proposed new partition, the number of 
admissible blocks at each tree level is significantly reduced 
compared to that generated by the conventional geometry 
based H-partition, thus leading to a significant reduction in 
computational cost. 


In addition, the actual number of the block rank should be 
determined and minimized based on accuracy requirements. 
This paper provides an efficient matrix algebra based method 
to perform this task with negligible computational overhead. 


Moreover, we developed an efficient H-based LU- 
factorization for directly solving the dense system matrix 
resulting from an IE-based analysis of large-scale electrody- 
namic problems. The dense matrix involving over | million 
unknowns formulated for a 96-wavelength problem is fac- 
torized in fast CPU run time, modest memory usage, and 
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Fig. 17: Simulation of a PEC plate from 2 A to 60 X. (a) Average partition rank. (b) Memory cost. (c) Solution error. (d) 
H-matrix construction time. (e) LU factorization time. (f) LU solution time. 


with prescribed accuracy satisfied. The proposed methods 
for minimizing the rank and optimizing the -partition not 
only can be used in the proposed direct solver, but also can 
be employed in other fast integral equation solvers for both 
iterative and direct solutions. 
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Fig. 18: Simulation of a PEC cylinder from 2 A to 96 A. (a) kave versus N. (b) Cag versus N. 
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Fig. 19: Simulation of a PEC cylinder from 2 A to 96 A. (a) Solution error. (b) LU factorization time. (c) LU solution time. 
(d) Memory Cost. 
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Fig. 20: Simulation of a PEC plate from 2 À to 60 À based on a higher-order accuracy setting (€ = 1075, €opt = 1074, and 
eu = 107°). (a) Solution error. (b) Memory Cost. (c) LU factorization time. (d) LU solution time. 


